linear layer
Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular value decomposition (SVD) provides a principled approach for model reduction, but its exact computation is expensive for large weight matrices. Randomized alternatives such as randomized SVD (RSVD) improve efficiency, yet they can suffer from poor approximation quality when the singular value spectrum decays slowly, a regime commonly observed in modern pretrained models. In this work, we address this limitation from both theoretical and empirical perspectives. First, we establish a connection between low-rank approximation error and predictive performance by analyzing softmax perturbations, showing that deviations in class probabilities are controlled by the spectral error of the compressed weights. Second, we demonstrate that RSVD is inadequate, and we propose randomized subspace iteration (RSI) as a more effective alternative. By incorporating multiple power iterations, RSI improves spectral separation and provides a controllable mechanism for enhancing approximation quality. We evaluate our approach on both convolutional networks and transformer-based architectures. Our results show that RSI achieves near-optimal approximation quality while outperforming RSVD in predictive accuracy under aggressive compression, enabling efficient model compression.
- North America > United States > Colorado > Denver County > Denver (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Israel (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
- Europe > Germany (0.04)
- Transportation > Ground > Road (0.34)
- Information Technology > Robotics & Automation (0.34)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Data Science (0.92)
- Information Technology > Artificial Intelligence > Vision (0.67)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- (5 more...)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- North America > United States > Minnesota (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Education (0.93)
- Information Technology > Software (0.46)
- North America > United States > Texas > Travis County > Austin (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Arizona (0.04)
- (2 more...)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
A Appendix
In the following subsections, we provide theoretical derivations. In this subsection, we provide a formal description of the consistency property of score matching. Assumption A.4. (Compactness) The parameter space is compact. Assumption A.5. (Identifiability) There exists a set of parameters A.3 are the conditions that ensure A.7 lead to the uniform convergence property [ In the following Lemma A.9 and Proposition A.10, we examine the sufficient condition for We show that the sufficient conditions stated in Lemma A.9 can be satisfied using the Figure A1: An illustration of the relationship between the variables discussed in Proposition 4.1, Lemma A.12, and Lemma A.13. The properties of KL divergence and Fisher divergence presented in the last two rows are derived in Lemmas A.12 In this section, we provide formal derivations for Proposition 4.1, Lemma A.12, and Lemma A.13. Based on Remark A.14, the following holds: D In this section, we elaborate on the experimental setups and provide the detailed configurations for the experiments presented in Section 5 of the main manuscript.
- Asia > Taiwan (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
- North America > United States (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.47)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)